
[Calibration] Add MoE Calibration Context #1596


Merged: dsikka merged 23 commits into main from provide_moe_calibration_mode on Jul 22, 2025

Conversation

dsikka (Collaborator) commented Jun 25, 2025

Summary:

  • Introduces an moe_calibration_context which, during calibration, replaces MoE blocks with custom modules so that every expert receives calibration data
  • The context is enabled through a new calibrate_moe_context argument; when set to True, the replacements are applied for the duration of calibration (see the sketch after this list)
  • Modules are replaced with new definitions in the prepare folder (shared with replace_modules_for_calibration)
  • This enables a second pathway for calibrating MoEs and other models that require module updates to be compatible with llm-compressor:
  1. Replacing modules temporarily during calibration
  2. Replacing modules permanently (as done by replace_modules_for_calibration, previously called prepare_for_calibration)
  • Similar to replace_modules_for_calibration, a dictionary defining the replacements has been added: moe_context
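
For orientation, here is a minimal sketch of how such a context could work. The names moe_context and moe_calibration_context come from the PR description, but the registry shape, the update/restore protocol, and all bodies below are illustrative assumptions, not the actual implementation:

```python
import contextlib

# Hypothetical registry mapping a model class name to an "update" function
# that patches the model for calibration and returns an undo callback.
moe_context = {}


@contextlib.contextmanager
def moe_calibration_context(model, calibrate_moe_context: bool = False):
    """Temporarily swap/patch MoE modules so all experts see data."""
    restore = None
    if calibrate_moe_context:
        update_fn = moe_context.get(model.__class__.__name__)
        if update_fn is not None:
            restore = update_fn(model)  # patches modules, returns an undo
    try:
        yield model
    finally:
        if restore is not None:
            restore()  # put the original modules/attributes back
```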

Testing

  • Tested with a Qwen/Qwen3-30B-A3B NVFP4 example, which has also been added to the examples folder; a rough sketch of the flow is below
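
A rough sketch of what the example's oneshot invocation might look like; the recipe details, ignore list, and dataset settings here are assumptions and may differ from the committed example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Qwen/Qwen3-30B-A3B"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize Linear layers to NVFP4, skipping the lm_head and the MoE
# router gates (illustrative ignore list).
recipe = QuantizationModifier(
    targets="Linear",
    scheme="NVFP4",
    ignore=["lm_head", "re:.*mlp.gate$"],
)

oneshot(
    model=model,
    dataset="ultrachat_200k",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    calibrate_moe_context=True,  # the new flag from this PR
)
```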

Next Steps:

  • Definitions for the updated MoE modules are hardcoded at the moment; we want to add parameters for finer control over the MoE forward pass, such as those proposed in [MoE] Add MoE calibration options #1593. This is especially important if we find that a certain configuration yields optimal calibration.
  • We may find it easier to refactor the calibration args into their own pydantic model rather than putting everything under the dataset args.


👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: this is required to complete the testing suite; please only add the label once the PR is code-complete and local testing has been performed.

gemini-code-assist bot (Contributor) left a comment

Summary of Changes

Hello @dsikka, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces specialized support for calibrating Mixture-of-Experts (MoE) models within the llmcompressor framework. It enables model-specific adjustments during the calibration process, which is crucial for accurately quantizing these complex architectures. The changes ensure that MoE models like Qwen3 and DeepseekV3 can be properly handled, improving the overall effectiveness of quantization for these models.

Highlights

  • MoE Calibration Context: Introduced a new moe_calibration_context mechanism to apply model-specific modifications during the calibration phase for Mixture-of-Experts (MoE) models. This allows for specialized handling required by MoE architectures during quantization.
  • Model-Specific MoE Handling: Implemented specific context updates for Qwen3 MoE models (patching the top_k attribute of MLP modules) and DeepseekV3 models (replacing MLP modules with a specialized version) to ensure proper calibration behavior for these architectures; a sketch of the Qwen3 patch follows this list.
  • Pipeline Integration: Integrated the calibrate_moe_context flag into the oneshot entrypoint and both the Independent and Sequential calibration pipelines. This enables conditional application of the MoE-specific calibration logic during the overall quantization process.
  • Qwen3 MoE Example: Added a new example script (examples/quantization_w4a4_fp4/qwen_30b_a2b.py) demonstrating how to quantize a Qwen3-30B-A3B MoE model using the new calibration context and the NVFP4 scheme.
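
As an illustration of the Qwen3 handling described above, a patch along these lines would force the router to select every expert during calibration. The helper name and exact mechanics are assumptions, though the top_k and num_experts attributes match Hugging Face's Qwen3 MoE blocks:

```python
def update_qwen3_for_calibration(model):
    """Patch Qwen3 sparse-MoE blocks so all experts receive data."""
    originals = []
    for module in model.modules():
        # Qwen3MoeSparseMoeBlock exposes `top_k` and `num_experts`
        if module.__class__.__name__ == "Qwen3MoeSparseMoeBlock":
            originals.append((module, module.top_k))
            module.top_k = module.num_experts  # route tokens to every expert

    def restore():
        for module, top_k in originals:
            module.top_k = top_k

    return restore
```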

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a calibration context for Mixture-of-Experts (MoE) models, which is a great addition for handling this model architecture during quantization. The changes involve adding logic to activate all experts during calibration for supported models like Qwen3 and DeepseekV3, and plumbing this feature through the oneshot workflow.

I've identified a critical issue in the implementation that will cause crashes for non-MoE models. I've also pointed out a high-severity issue related to a hardcoded feature flag and a few medium-severity issues regarding code clarity and robustness. Addressing these points will significantly improve the quality and stability of this new feature.

dsikka added the ready label on Jul 3, 2025
dsikka marked this pull request as ready for review on July 3, 2025 at 16:29
kylesayrs (Collaborator) left a comment

Is the plan to use this for llama4 as well, or will that be a separate function?

dsikka (Collaborator, Author) commented Jul 7, 2025

Is the plan to use this for llama4 as well, or will that be a separate function?

I think for Llama4 we may want to change the structure permanently, in which case we'd want to use replace_modules_for_calibration so that we can also compress it correctly post-calibration. A rough contrast of the two pathways is sketched below.
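
Roughly, the two pathways contrast like this (run_calibration and the exact signatures are hypothetical placeholders):

```python
# Pathway 1: temporary replacement, undone after calibration; the saved
# model keeps its original module structure.
with moe_calibration_context(model, calibrate_moe_context=True):
    run_calibration(model, dataloader)

# Pathway 2: permanent replacement via replace_modules_for_calibration;
# the restructured model is what gets compressed and saved afterwards,
# which is likely what Llama4 would need.
model = replace_modules_for_calibration(model)
run_calibration(model, dataloader)
```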

brian-dellabetta (Collaborator) left a comment

couple nits

dsikka added 4 commits July 14, 2025 20:19
kylesayrs (Collaborator) left a comment

Happy to land as is

dsikka merged commit 0123644 into main on Jul 22, 2025
10 checks passed
dsikka deleted the provide_moe_calibration_mode branch on July 22, 2025 at 23:23